<<<<<<< HEAD Assignment 8

Team Part:

An overall question we would like to investigate in next weeks lab is: what trends are there in animals leaving and entering a humane society and what factors lead to more animals getting adopted?

For our domain expert, we are going to be doing this research for the Austin Animal Shelter in Texas. By analyzing their data, we can help them to determine what factors lead to higher rates of adoption of their animals. We can also see which factors may be applicable to other animal shelters across the country. By discovering the trends present in this data set they will be able to change policy and practices to hopefully increase adoption rates and possibly decrease instances of euthanasia.

The data we will use to answer our questions is a dataset from Kaggle that gives information about thousands of animals admitted to the Austin Animal Center Shelter. The data set describes each animal with details including its age, gender, and condition, and outcome of the animal. This dataset can be found at the following URL: https://www.kaggle.com/aaronschlegel/austin-animal-center-shelter-outcomes-and

shelterdata3<-select(shelter_data33, -animal_id, -color, -monthyear, -name, -X13, -X14, -X15, -X16, -X17, -X18)
shelterdata3<- shelterdata3 %>% separate(sex_upon_outcome, into=c("Status","Sex"),sep=" ", fill="left")
shelterdata3<- shelterdata3 %>% separate(datetime, into=c("Year","Month"),sep="-",extra="drop")
## Warning: Expected 2 pieces. Missing pieces filled with `NA` in 1 rows [31].
shelterdata3<-shelterdata3[-31,]
shelterdata3$Year<-parse_integer(shelterdata3$Year)
shelterdata3$Month<-parse_integer(shelterdata3$Month)
shelterdata4<-as.tibble(select(shelterdata3, age_upon_outcome,animal_type,outcome_type,Sex))
testing6<-as.tibble(filter(shelterdata4,age_upon_outcome!="NULL",outcome_type!="Died",outcome_type!="Missing",outcome_type!="Rto-Adopt",outcome_type!="NA"))

testing7<-testing6%>%
  separate(age_upon_outcome,c("age_num","age_unit")," ")
testing8<-as.tibble(filter(testing7,age_unit!="day",age_unit!="days",age_unit!="month",age_unit!="week",age_unit!="year",age_unit!="weeks"))
testing9<-as.tibble(filter(testing8,outcome_type=="Adoption"))
ggplot(data = testing9,mapping=aes(x=animal_type),na.rm=TRUE)+geom_bar()+facet_wrap(~age_unit)

As we can see from the plot above, one of the major factors influencing adoption rates is age, dependent on species. For cats, there are more adoptions when they are younger, while there are conversely more adoptions for older dogs. This could be due to the fact that older dogs are more likely to be perceived as mild, and could have been trained by previous owners to be well-behaved. Since cats tend to wander outside and mate, there could be a correlation between young cats and adoption simply because there may be more young cats born on the street that become strays. Either way, we see a clear trend to adopt more younger cats and older dogs.

Cats end up in the shelter more in the Summer

animals_train = read_csv("~/Downloads/train.csv")
animals_test = read_csv("~/Downloads/test.csv")
animals_train = animals_train[,-1]
animals_test = animals_test[,-1]
animals <- bind_rows(animals_train, animals_test)

animals$Name[is.na(animals$Name)] <- "Unknown"
animals$HasName <- ifelse(animals$Name == 'Unknown', 0,
                    ifelse(animals$Name != 'Unknown', 1, NA))
animals <- animals %>% mutate(HasName = as.factor(HasName))

animals %>% ggplot(aes(DateTime, fill=HasName)) +
  geom_density(alpha=0.5) + 
  facet_wrap(~AnimalType)

In this plot, we found that cats go missing from there home significantly more often in the summer than any other time of year, while the rate of dogs coming to the shelter are relatively constant throughout the year. This can probably be attributed to the fact that people may have windows and doors open more often in their homes during the summer than any other time of year, making it easier for cats to escape and explore beyond their boundaries. This means that every summer, there is generally a larger influx of cats admitted to the animal shelter. The shelter could use this information to make sure they are equipped to handle more cats in the summer than the rest of the year; on the other hand, they can expect to have a relatively consistent number of dogs in the shelter.

Nameless animals are unloved

library(scales)
ggplot(animals %>% subset(OutcomeType != "Unknown") %>%
         count(HasName, OutcomeType) %>%    
         mutate(pct=n/sum(n)),              
       aes(HasName, n, fill=OutcomeType)) +
       geom_bar(position = "fill", stat="identity") +
       scale_y_continuous(labels = percent_format())

Here we can see that animals that have a known name fare far better than ones with an unknown name. Animals with an unknown name are far less likely to be adopted or returned to their owner and far more likely to die or be euthanised. This is important for animal owners to note, making sure your pet is identifiebly taged will greatly increase your chances of you pet being returned to you should you lose it.

David

Questions:

Import data into R

library(tidyverse)
mydata<-read.csv('aac_shelter_outcomes.csv')
mydata<-mydata[,1:12]

Tidy the data

mydata %>%group_by(animal_type,outcome_type)%>%count()
## # A tibble: 37 x 3
## # Groups:   animal_type, outcome_type [37]
##    animal_type outcome_type        n
##    <fct>       <fct>           <int>
##  1 Bird        Adoption          111
##  2 Bird        Died                4
##  3 Bird        Disposal           22
##  4 Bird        Euthanasia         82
##  5 Bird        Missing             1
##  6 Bird        Relocate            7
##  7 Bird        Return to Owner     9
##  8 Bird        Transfer           76
##  9 Cat         ""                  2
## 10 Cat         Adoption        11621
## # ... with 27 more rows

Create and transform variables

library(ggplot2)
thedata<-subset(mydata,select =c(age_upon_outcome,animal_type,outcome_type))
thedata<-subset(thedata,thedata$outcome_type!="")
thedata$age_upon_outcome<-as.character(thedata$age_upon_outcome)
thedata$age_type<-(strsplit(thedata$age_upon_outcome," "))
thedata$age_type<-as.character(lapply(thedata$age_type,function(x){unlist(x)[2]}))
cat_data<-subset(thedata,thedata$animal_type=="Cat")
cat_data<-na.omit(cat_data)

Create plots of the variables and relationships between variables or create meaningful summaries of the data

ggplot(data=thedata,aes(x=animal_type,fill=factor(outcome_type)))+geom_bar(position="fill")+labs(title = "outcomes of entering the shelter for different types of animals", x = "types of animals", y = "percentage")

ggplot(data=cat_data,aes(x=age_type,fill=factor(outcome_type)))+geom_bar(position="fill")+labs(title = "outcomes of entering the shelter for different age of cats", x = "types of age", y = "percentage")

Findings

For the first question, we can see that the cat and dog have bigger percentage to be adopt, and the cat has bigger percentage to be adopt in the months of age, also we could find that cat has the smallest percentage to be adopt in the days of the month. Therefore, we need to care more about younger cats.

Will

My primary concern is how breed of dog/cats will affect the sex_upon_outcome of animals that enter into the Austin Animal center. We are curious about whether the sex_upon_outcome has a significant different when breed varies.

Then, we will explore which breeds of animals are more likely to enter into the Austin Animal center.

We divide our result as two parts, one deals with dogs and the other deals with cats.

Dogs in Austin Animal Shelter

## # A tibble: 14 x 4
##    breed                     sex_upon_outcome  freq pecentage
##    <chr>                     <chr>            <int>     <dbl>
##  1 Labrador Retriever Mix    Intact Male        409   0.00924
##  2 Australian Cattle Dog Mix Spayed Female      413   0.00934
##  3 Chihuahua Shorthair Mix   Intact Female      443   0.0100 
##  4 Chihuahua Shorthair Mix   Intact Male        466   0.0105 
##  5 German Shepherd Mix       Spayed Female      649   0.0147 
##  6 Pit Bull Mix              Intact Female      682   0.0154 
##  7 German Shepherd Mix       Neutered Male      699   0.0158 
##  8 Pit Bull Mix              Intact Male        804   0.0182 
##  9 Chihuahua Shorthair Mix   Spayed Female     1499   0.0339 
## 10 Labrador Retriever Mix    Spayed Female     1583   0.0358 
## 11 Chihuahua Shorthair Mix   Neutered Male     1789   0.0404 
## 12 Labrador Retriever Mix    Neutered Male     1814   0.0410 
## 13 Pit Bull Mix              Spayed Female     1920   0.0434 
## 14 Pit Bull Mix              Neutered Male     2140   0.0484

After running the R script, we find that the Pit Bull Mix is of most often spayed or neutered: 4.82% is spayed female, and 5.32% is neutered.

## # A tibble: 11 x 2
##    breed                      freq
##    <chr>                     <int>
##  1 Rat Terrier Mix             412
##  2 Catahoula Mix               423
##  3 Miniature Poodle Mix        564
##  4 Border Collie Mix           574
##  5 Boxer Mix                   617
##  6 Dachshund Mix               698
##  7 Australian Cattle Dog Mix   938
##  8 German Shepherd Mix        1724
##  9 Chihuahua Shorthair Mix    4223
## 10 Labrador Retriever Mix     4224
## 11 Pit Bull Mix               5562

From the analysis above, we find that Pit Bull Mix, Chihuahua Shorthair Mix and Labrador Retriever Mix are among the most breeds in the Austin Animal center.

Cats in Austin Animal shelter

For cats,

## # A tibble: 179 x 3
##    breed                    sex_upon_outcome  freq
##    <chr>                    <chr>            <int>
##  1 Domestic Medium Hair Mix Neutered Male      657
##  2 Domestic Shorthair Mix   Intact Female     3381
##  3 Domestic Shorthair Mix   Neutered Male     6109
##  4 Domestic Shorthair Mix   Unknown           2070
##  5 <NA>                     <NA>                NA
##  6 <NA>                     <NA>                NA
##  7 <NA>                     <NA>                NA
##  8 <NA>                     <NA>                NA
##  9 <NA>                     <NA>                NA
## 10 <NA>                     <NA>                NA
## # ... with 169 more rows

Again, we find that the cat Domestic Shorthair Mix are among the most neutered or spayed.

##                      breed  freq
## 3     Domestic Medium Hair   119
## 8             Snowshoe Mix   122
## 1   American Shorthair Mix   209
## 5       Domestic Shorthair   348
## 7              Siamese Mix   888
## 2    Domestic Longhair Mix  1101
## 4 Domestic Medium Hair Mix  2100
## 6   Domestic Shorthair Mix 20996

From the analysis above, we know that Domestic Shorthair Mix, Domestic Medium Hair Mix and American Shorthair Mix are most adopted in the center.

Conclusion

We find that Pit Bull Mix and Domestic shorthair Mix are adopted by the Austin Animal Center mostly. And they are spayed or neutered more than other breeds. Dogs and cats are our friends, they are cute. Everyone that has pets with dogs/cats to treat their pets wisely.

Derek

Tidying the Data

animals_train = read_csv("~/Downloads/train.csv", suppressMessages(TRUE))
animals_test = read_csv("~/Downloads/test.csv", suppressMessages(TRUE))
# Drop 'ID' column (first column)
animals_train = animals_train[,-1]
animals_test = animals_test[,-1]
# merge train and test data together
animals <- bind_rows(animals_train, animals_test)

# This converts the AgeuponOutcome into number of days
animals$TimeValue <- sapply(animals$AgeuponOutcome,  
                      function(x) strsplit(x, split = ' ')[[1]][1])
animals$UnitofTime <- sapply(animals$AgeuponOutcome,  
                      function(x) strsplit(x, split = ' ')[[1]][2])
animals$UnitofTime <- gsub('s', '', animals$UnitofTime)
animals$TimeValue  <- as.numeric(animals$TimeValue)
animals$UnitofTime <- as.factor(animals$UnitofTime)
multiplier <- ifelse(animals$UnitofTime == 'day', 1,
              ifelse(animals$UnitofTime == 'week', 7,
              ifelse(animals$UnitofTime == 'month', 30,
              ifelse(animals$UnitofTime == 'year', 365, NA))))
animals$AgeinDays <- animals$TimeValue * multiplier

# drop now unnessesary columns
animals <- subset(animals, select=-c(TimeValue, UnitofTime, AgeuponOutcome))
# fill in some NAs in the data
animals$Name[is.na(animals$Name)] <- "Unknown"
animals$SexuponOutcome[is.na(animals$SexuponOutcome)] <- "Unknown"
animals$OutcomeType[is.na(animals$OutcomeType)] <- "Unknown"
animals$OutcomeSubtype[is.na(animals$OutcomeSubtype)] <- "Unknown"
# create a new value based on whether the name is known
animals$HasName <- ifelse(animals$Name == 'Unknown', 0,
                    ifelse(animals$Name != 'Unknown', 1, NA))
# lastly convert to factors
animals <- animals %>% mutate(Name = as.factor(Name),
                              OutcomeType = as.factor(OutcomeType),
                              OutcomeSubtype = as.factor(OutcomeSubtype),
                              AnimalType = as.factor(AnimalType),
                              SexuponOutcome = as.factor(SexuponOutcome),
                              Breed = as.factor(Breed),
                              Color = as.factor(Color),
                              HasName = as.factor(HasName))

Findings

animals %>% subset(OutcomeType != "Unknown" & OutcomeType != "Transfer" & AnimalType == "Cat") %>% ggplot(aes(DateTime, fill=OutcomeType)) +
  geom_density(alpha=0.25)

animals %>% subset(AnimalType == "Cat") %>% ggplot(aes(DateTime, AgeinDays)) +
  geom_point(alpha = 0.01) +
  scale_y_log10()
## Warning: Transformation introduced infinite values in continuous y-axis
## Warning: Removed 23 rows containing missing values (geom_point).

I wanted to investigate the odd behavior of cats in the datset, particularly the large increase in cats entering the shelter in the summer compared to the the winter. Especially becuase dogs don’t show these seasonal variations. In the first plot I looked at the distribution of outcomes over time. While return to owner remains relatively constant, we see large spikes in the other three outcomes during the summer months. Especially deaths in the summer of 2015. A quick google search found that Texas was undergoing a large heatwave in July 2015 and I believe many more cats may have dies to heat exposure.

I had two theories as to why cat may turn up more in the shelter in the summer. The first was that owners may be leaving their doors and windows open and cats would escape the house, or was this due to the breading cycle of the cats, which typically give birth in the spring. To do this I plotted the age of cats over time. The horizontal banding is due to the coversion of discrete time increments to linear. The darkest line (and therefore most frequent) is right at the 3 month or 90 day mark. Additionally we see distinct gaps during the winter of young cats, while older cats do not display these same gaps. Based on this I believe the spikes in cats are due to outdoor or feral cats breeding each year. This is a known problem and many states and contries have passed laws requiring neutering of cats to pevent rampant breeding.

Sarah

Tidying the Data

To begin tidying the data, I first removed columns I didn’t feel were necessary for our analysis, including variables like the animal ID and the name of the animal. I then separated the sex and status of whether the animal is neutered or not from one column to two. I then made the datetime column more useful by only including the month and the year and separting the two so that they could be examined separately. The code I used to tidy the data can be found below:

shelterdata1<-select(shelter_data, -animal_id, -color, -monthyear, -name, -X13, -X14, -X15, -X16, -X17, -X18)
shelterdata1<- shelterdata1 %>% separate(sex_upon_outcome, into=c("Status","Sex"),sep=" ", fill="left")
shelterdata1<- shelterdata1 %>% separate(datetime, into=c("Year","Month"),sep="-",extra="drop")
shelterdata1<-shelterdata1[-31,]
shelterdata1$Year<-parse_integer(shelterdata1$Year)
shelterdata1$Month<-parse_integer(shelterdata1$Month)

Findings

I began by investigating different patterns between years at the animal shelter. First, I analyzed the spread of animals through various years in regards to their spayed and neutered status. The spread seems to have remained constant over the past few years, though it is interesting to note that the number of total animals reported increased significantly between 2013 and 2015. This suggests that the animal shelter has expanded and has made changes that allow them to take in more animals

Next, I examined patterns in the various outcomes between each type of animal for each year in the dataset. The first plot shows the outcome type per year, and the second shows the outcome subtype. While many of the outcomes seem to be relatively constant throughout the years, some things that stand out are that there is a massive spike in rto-adoptions for dogs in 2017. The animal might want to examine what changes occurred in 2017 that caused this and continue it in the following years to initiate more rto-adoptions. A similar spike occurs in cats for the same year, though not as large.

Finally, I examined patterns between months for different types of outcomes in different animals. The most notable trend amongst all this is that there seems to be a pattern between birds, cats, and dogs of larger amount of disposal in the beginning of the year.Overall, the transfer program and adoption rates seem consistent on a monthly basis, so the shelter seems to be doing a good job with that.

Abby

To analyze our overall question of which factors lead to animals being more or less likely to be adopted, I decided to analyze how age affects adoption rates. If we can identify patterns about what aged animals are more and less likely to be adopted, then we can make broader conclusions about the factors that influence both adoption rates, and the likely outcome of an animal coming into the shelter.

First, I seperated the age_upon_outcome column to be able to separate among the age levesl of weeks, months, and years. There seems to be an interesting trend among the adoption levels of animal species among the different age categories. For example, among all animal types none were adopted when they were less than 1 month old. Among birds, there were roughly equal amounts of adoptions among months old and years old birds. Among Other types there was a slightly larger amount of animals being adopted at months old and not years old. There was an increase in adoption of older Dogs compared with younger dogs. While there seemed to be a decrease in adoption of cats as they were older.

This implies that among animals entering shelters, they would be more likely to be adopted younger if they are cats, while they are more likely to be adopted older if they are dogs. We can also analyze to see the general trends among species in outcomes.

From the plot above we can see that among birds, there is no trend in outcome. Thus for birds we cannot make accurate predictions on their eventual outcomes once they have entered a shelter. Among Cats, their likely outcomes upon entering the shelter are being adopted followed by transfering to a new shelter. Dogs are most likely to be adopted, followed by being returned to their owners, and then being transfered. Livestock only seem to be transfered or adopted, but never euthanized. Other types of animals are most likely to be euthanized. This illustrates that species is a big factor in the eventual outcome of the animal entering into the shelter.

Summary of Individual Contribution

Sarah: In order to tidy the dataset, I primarily used select separate, and the parse function. I used these to select important columns, break up columns into multiple variables, and turn them into a type that would be useful to use. I plotted many different variables in the dataset over different years and months to look for patterns amongst them so the shelter might be able to reflect on what they did in those years or months that would have caused these patterns, particularly among the outcomes of animals.

David: In this report, I firstly count the number of the different animals with the different outcomes using group_by and count in tidyverse. Then we select variables and we use and transform the age variables to category variables including day,days,month,months,week,weeks,year,years to represent the different age. Finally we create plots of the variables to answer the question.

Derek: I tidyied the dataset by droping uneccessary columns, appropriately converting columns to factors, I fixed NA values by filling in an “Unknown” value. Lastly I created a HasName variable and converted the age of the animal into days. This let me compare ages as a linear values instead of factors. I then looks at whether the absence of a known name affected animal outcome, and then I looked at some of the pecularities of cats during the summer.

Abby: I tidied some of the data Sarah had previously tidied in order to do my analysis. I primarly did this through separate in order to get an age group column to use in my comparisons. I then created a plot to show the adoption counts for each animal type among the different age groups. This allowed me to see how age affected the adoption rates of different species. Finally, I created a plot showing the counts of different outcomes for each animal species. This allowed me to draw more conclusions about the likely outcome of an animal when they first enter the shelter.

Will: For this report, I used the as.tibble,count and order to figure out the breed of cat and dog. In order to calculate how many dogs and cats entering into the shelter in different species. Finally I created the frequency of the different breed of gods that entering into the animal shelter.

East Africa Lake Dataset

From the graph above we can see that, on average, rainfall has decrease for the variables rainfall, charcoal, and BSi. However, for the variables BSi.mar, n, and TEX86 there seems to have been neither an increase nor a decrease in the rainfall. The variable for average rainfall seems to fluctuate, but has generally been decreasing, as the years increase.

======= Assignment 8

Team Part:

An overall question we would like to investigate in next weeks lab is: what trends are there in animals leaving and entering a humane society and what factors lead to more animals getting adopted?

For our domain expert, we are going to be doing this research for the Austin Animal Shelter in Texas. By analyzing their data, we can help them to determine what factors lead to higher rates of adoption of their animals. We can also see which factors may be applicable to other animal shelters across the country. By discovering the trends present in this data set they will be able to change policy and practices to hopefully increase adoption rates and possibly decrease instances of euthanasia.

The data we will use to answer our questions is a dataset from Kaggle that gives information about thousands of animals admitted to the Austin Animal Center Shelter. The data set describes each animal with details including its age, gender, and condition, and outcome of the animal. This dataset can be found at the following URL: https://www.kaggle.com/aaronschlegel/austin-animal-center-shelter-outcomes-and

shelterdata3<-select(shelter_data33, -animal_id, -color, -monthyear, -name, -X13, -X14, -X15, -X16, -X17, -X18)
shelterdata3<- shelterdata3 %>% separate(sex_upon_outcome, into=c("Status","Sex"),sep=" ", fill="left")
shelterdata3<- shelterdata3 %>% separate(datetime, into=c("Year","Month"),sep="-",extra="drop")
## Warning: Too few values at 1 locations: 31
shelterdata3<-shelterdata3[-31,]
shelterdata3$Year<-parse_integer(shelterdata3$Year)
shelterdata3$Month<-parse_integer(shelterdata3$Month)
shelterdata4<-as.tibble(select(shelterdata3, age_upon_outcome,animal_type,outcome_type,Sex))
testing6<-as.tibble(filter(shelterdata4,age_upon_outcome!="NULL",outcome_type!="Died",outcome_type!="Missing",outcome_type!="Rto-Adopt",outcome_type!="NA"))
## Warning: package 'bindrcpp' was built under R version 3.3.3
testing7<-testing6%>%
  separate(age_upon_outcome,c("age_num","age_unit")," ")
testing8<-as.tibble(filter(testing7,age_unit!="day",age_unit!="days",age_unit!="month",age_unit!="week",age_unit!="year",age_unit!="weeks"))
testing9<-as.tibble(filter(testing8,outcome_type=="Adoption"))
ggplot(data = testing9,mapping=aes(x=animal_type),na.rm=TRUE)+geom_bar()+facet_wrap(~age_unit)

As we can see from the plot above, one of the major factors influencing adoption rates is age, dependent on species. For cats, there are more adoptions when they are younger, while there are conversely more adoptions for older dogs. This could be due to the fact that older dogs are more likely to be perceived as mild, and could have been trained by previous owners to be well-behaved. Since cats tend to wander outside and mate, there could be a correlation between young cats and adoption simply because there may be more young cats born on the street that become strays. Either way, we see a clear trend to adopt more younger cats and older dogs.

Enter cat missing during summer plot here!

In this plot, we found that cats go missing from there home significantly more often in the summer than any other time of year, while the rate of dogs coming to the shelter are relatively constant throughout the year. This can probably be attributed to the fact that people may have windows and doors open more often in their homes during the summer than any other time of year, making it easier for cats to escape and explore beyond their boundaries. This means that every summer, there is generally a larger influx of cats admitted to the animal shelter. The shelter could use this information to make sure they are equipped to handle more cats in the summer than the rest of the year; on the other hand, they can expect to have a relatively consistent number of dogs in the shelter.

Enter names being adopted plot here with writeup!

David

Questions:

Import data into R

library(tidyverse)
mydata<-read.csv('aac_shelter_outcomes.csv')
mydata<-mydata[,1:12]

Tidy the data

mydata %>%group_by(animal_type,outcome_type)%>%count()
## # A tibble: 37 x 3
## # Groups: animal_type, outcome_type [37]
##    animal_type outcome_type        n
##    <fct>       <fct>           <int>
##  1 Bird        Adoption          111
##  2 Bird        Died                4
##  3 Bird        Disposal           22
##  4 Bird        Euthanasia         82
##  5 Bird        Missing             1
##  6 Bird        Relocate            7
##  7 Bird        Return to Owner     9
##  8 Bird        Transfer           76
##  9 Cat         ""                  2
## 10 Cat         Adoption        11621
## # ... with 27 more rows

Create and transform variables

library(ggplot2)
thedata<-subset(mydata,select =c(age_upon_outcome,animal_type,outcome_type))
thedata<-subset(thedata,thedata$outcome_type!="")
thedata$age_upon_outcome<-as.character(thedata$age_upon_outcome)
thedata$age_type<-(strsplit(thedata$age_upon_outcome," "))
thedata$age_type<-as.character(lapply(thedata$age_type,function(x){unlist(x)[2]}))
cat_data<-subset(thedata,thedata$animal_type=="Cat")
cat_data<-na.omit(cat_data)

Create plots of the variables and relationships between variables or create meaningful summaries of the data

ggplot(data=thedata,aes(x=animal_type,fill=factor(outcome_type)))+geom_bar(position="fill")+labs(title = "outcomes of entering the shelter for different types of animals", x = "types of animals", y = "percentage")

ggplot(data=cat_data,aes(x=age_type,fill=factor(outcome_type)))+geom_bar(position="fill")+labs(title = "outcomes of entering the shelter for different age of cats", x = "types of age", y = "percentage")

Findings

For the first question, we can see that the cat and dog have bigger percentage to be adopt, and the cat has bigger percentage to be adopt in the months of age, also we could find that cat has the smallest percentage to be adopt in the days of the month. Therefore, we need to care more about younger cats.

Will

My primary concern is how breed of dog/cats will affect the sex_upon_outcome of animals that enter into the Austin Animal center. We are curious about whether the sex_upon_outcome has a significant different when breed varies.

Then, we will explore which breeds of animals are more likely to enter into the Austin Animal center.

We divide our result as two parts, one deals with dogs and the other deals with cats.

Dogs in Austin Animal Shelter

## # A tibble: 14 x 4
##    breed                     sex_upon_outcome  freq pecentage
##    <chr>                     <chr>            <int>     <dbl>
##  1 Labrador Retriever Mix    Intact Male        409   0.00924
##  2 Australian Cattle Dog Mix Spayed Female      413   0.00934
##  3 Chihuahua Shorthair Mix   Intact Female      443   0.0100 
##  4 Chihuahua Shorthair Mix   Intact Male        466   0.0105 
##  5 German Shepherd Mix       Spayed Female      649   0.0147 
##  6 Pit Bull Mix              Intact Female      682   0.0154 
##  7 German Shepherd Mix       Neutered Male      699   0.0158 
##  8 Pit Bull Mix              Intact Male        804   0.0182 
##  9 Chihuahua Shorthair Mix   Spayed Female     1499   0.0339 
## 10 Labrador Retriever Mix    Spayed Female     1583   0.0358 
## 11 Chihuahua Shorthair Mix   Neutered Male     1789   0.0404 
## 12 Labrador Retriever Mix    Neutered Male     1814   0.0410 
## 13 Pit Bull Mix              Spayed Female     1920   0.0434 
## 14 Pit Bull Mix              Neutered Male     2140   0.0484

After running the R script, we find that the Pit Bull Mix is of most often spayed or neutered: 4.82% is spayed female, and 5.32% is neutered.

## # A tibble: 11 x 2
##    breed                      freq
##    <chr>                     <int>
##  1 Rat Terrier Mix             412
##  2 Catahoula Mix               423
##  3 Miniature Poodle Mix        564
##  4 Border Collie Mix           574
##  5 Boxer Mix                   617
##  6 Dachshund Mix               698
##  7 Australian Cattle Dog Mix   938
##  8 German Shepherd Mix        1724
##  9 Chihuahua Shorthair Mix    4223
## 10 Labrador Retriever Mix     4224
## 11 Pit Bull Mix               5562

From the analysis above, we find that Pit Bull Mix, Chihuahua Shorthair Mix and Labrador Retriever Mix are among the most breeds in the Austin Animal center.

Cats in Austin Animal shelter

For cats,

## # A tibble: 179 x 3
##    breed                    sex_upon_outcome  freq
##    <chr>                    <chr>            <int>
##  1 Domestic Medium Hair Mix Neutered Male      657
##  2 Domestic Shorthair Mix   Intact Female     3381
##  3 Domestic Shorthair Mix   Neutered Male     6109
##  4 Domestic Shorthair Mix   Unknown           2070
##  5 <NA>                     <NA>                NA
##  6 <NA>                     <NA>                NA
##  7 <NA>                     <NA>                NA
##  8 <NA>                     <NA>                NA
##  9 <NA>                     <NA>                NA
## 10 <NA>                     <NA>                NA
## # ... with 169 more rows

Again, we find that the cat Domestic Shorthair Mix are among the most neutered or spayed.

##                      breed  freq
## 3     Domestic Medium Hair   119
## 8             Snowshoe Mix   122
## 1   American Shorthair Mix   209
## 5       Domestic Shorthair   348
## 7              Siamese Mix   888
## 2    Domestic Longhair Mix  1101
## 4 Domestic Medium Hair Mix  2100
## 6   Domestic Shorthair Mix 20996

From the analysis above, we know that Domestic Shorthair Mix, Domestic Medium Hair Mix and American Shorthair Mix are most adopted in the center.

Conclusion

We find that Pit Bull Mix and Domestic shorthair Mix are adopted by the Austin Animal Center mostly. And they are spayed or neutered more than other breeds. Dogs and cats are our friends, they are cute. Everyone that has pets with dogs/cats to treat their pets wisely.

Sarah

Sarah Individual

Tidying the Data

To begin tidying the data, I first removed columns I didn’t feel were necessary for our analysis, including variables like the animal ID and the name of the animal. I then separated the sex and status of whether the animal is neutered or not from one column to two. I then made the datetime column more useful by only including the month and the year and separting the two so that they could be examined separately. The code I used to tidy the data can be found below:

shelterdata1<-select(shelter_data, -animal_id, -color, -monthyear, -name, -X13, -X14, -X15, -X16, -X17, -X18)
shelterdata1<- shelterdata1 %>% separate(sex_upon_outcome, into=c("Status","Sex"),sep=" ", fill="left")
shelterdata1<- shelterdata1 %>% separate(datetime, into=c("Year","Month"),sep="-",extra="drop")
shelterdata1<-shelterdata1[-31,]
shelterdata1$Year<-parse_integer(shelterdata1$Year)
shelterdata1$Month<-parse_integer(shelterdata1$Month)

Findings

I began by investigating different patterns between years at the animal shelter. First, I analyzed the spread of animals through various years in regards to their spayed and neutered status. The spread seems to have remained constant over the past few years, though it is interesting to note that the number of total animals reported increased significantly between 2013 and 2015. This suggests that the animal shelter has expanded and has made changes that allow them to take in more animals

Next, I examined patterns in the various outcomes between each type of animal for each year in the dataset. The first plot shows the outcome type per year, and the second shows the outcome subtype. While many of the outcomes seem to be relatively constant throughout the years, some things that stand out are that there is a massive spike in rto-adoptions for dogs in 2017. The animal might want to examine what changes occurred in 2017 that caused this and continue it in the following years to initiate more rto-adoptions. A similar spike occurs in cats for the same year, though not as large.

Finally, I examined patterns between months for different types of outcomes in different animals. The most notable trend amongst all this is that there seems to be a pattern between birds, cats, and dogs of larger amount of disposal in the beginning of the year.Overall, the transfer program and adoption rates seem consistent on a monthly basis, so the shelter seems to be doing a good job with that.

Abby

To analyze our overall question of which factors lead to animals being more or less likely to be adopted, I decided to analyze how age affects adoption rates. If we can identify patterns about what aged animals are more and less likely to be adopted, then we can make broader conclusions about the factors that influence both adoption rates, and the likely outcome of an animal coming into the shelter.

First, I seperated the age_upon_outcome column to be able to separate among the age levesl of weeks, months, and years. There seems to be an interesting trend among the adoption levels of animal species among the different age categories. For example, among all animal types none were adopted when they were less than 1 month old. Among birds, there were roughly equal amounts of adoptions among months old and years old birds. Among Other types there was a slightly larger amount of animals being adopted at months old and not years old. There was an increase in adoption of older Dogs compared with younger dogs. While there seemed to be a decrease in adoption of cats as they were older.

This implies that among animals entering shelters, they would be more likely to be adopted younger if they are cats, while they are more likely to be adopted older if they are dogs. We can also analyze to see the general trends among species in outcomes.

From the plot above we can see that among birds, there is no trend in outcome. Thus for birds we cannot make accurate predictions on their eventual outcomes once they have entered a shelter. Among Cats, their likely outcomes upon entering the shelter are being adopted followed by transfering to a new shelter. Dogs are most likely to be adopted, followed by being returned to their owners, and then being transfered. Livestock only seem to be transfered or adopted, but never euthanized. Other types of animals are most likely to be euthanized. This illustrates that species is a big factor in the eventual outcome of the animal entering into the shelter.

Summary of Individual Contribution

Sarah: In order to tidy the dataset, I primarily used select separate, and the parse function. I used these to select important columns, break up columns into multiple variables, and turn them into a type that would be useful to use. I plotted many different variables in the dataset over different years and months to look for patterns amongst them so the shelter might be able to reflect on what they did in those years or months that would have caused these patterns, particularly among the outcomes of animals.

David: In this report, I firstly count the number of the different animals with the different outcomes using group_by and count in tidyverse. Then we select variables and we use and transform the age variables to category variables including day,days,month,months,week,weeks,year,years to represent the different age. Finally we create plots of the variables to answer the question.

Abby: I tidied some of the data Sarah had previously tidied in order to do my analysis. I primarly did this through separate in order to get an age group column to use in my comparisons. I then created a plot to show the adoption counts for each animal type among the different age groups. This allowed me to see how age affected the adoption rates of different species. Finally, I created a plot showing the counts of different outcomes for each animal species. This allowed me to draw more conclusions about the likely outcome of an animal when they first enter the shelter.

East Africa Lake Dataset

From the graph above we can see that, on average, rainfall has decrease for the variables rainfall, charcoal, and BSi. However, for the variables BSi.mar, n, and TEX86 there seems to have been neither an increase nor a decrease in the rainfall. The variable for average rainfall seems to fluctuate, but has generally been decreasing, as the years increase.

>>>>>>> 17ec5e8da25be33079496f3cb0240c9a38c18c86